The Internet of Things (IoT) is a rapidly emerging paradigm that now touches almost every aspect of modern life, so ensuring the security of IoT devices is crucial. IoT devices can differ from traditional computing platforms, which makes designing and implementing appropriate security measures for them challenging. We observe that IoT developers discuss their security-related challenges in developer forums such as Stack Overflow (SO). However, we find that IoT security discussions on SO can also be buried inside non-security discussions. In this paper, we aim to understand the challenges IoT developers face when applying security practices and techniques to IoT devices. We have two goals: (1) develop a model that can automatically find security-related IoT discussions on SO, and (2) study the model's output to understand the security-related challenges of IoT developers. First, we downloaded 53K posts from SO that contain discussions about IoT. Second, we manually labeled 5,919 sentences from the 53K posts as 1 or 0 (security-related or not). Third, we used this benchmark to study a suite of deep-learning transformer models; the best-performing model is called SecBot. Fourth, we applied SecBot to the entire set of posts and found about 30K security-related sentences. Fifth, we applied topic modeling to the security-related sentences, then labeled and categorized the topics. Sixth, we analyzed the evolution of the topics. We find that (1) SecBot is based on retraining the deep-learning model RoBERTa and offers a best F1-score of 0.935; (2) there are six error categories among the samples SecBot misclassifies, and SecBot is mostly wrong when the keyword/context is ambiguous (e.g., a gateway can be a security gateway or a plain gateway); (3) there are nine security topics grouped into three categories: Software, Hardware, and Network; and (4) the largest number of topics belongs to software security, followed by network security.
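The classification step above amounts to fine-tuning a pre-trained transformer on the manually labeled sentences. A minimal sketch of such a setup, assuming the Hugging Face transformers and datasets libraries and illustrative CSV files with `sentence` and `label` columns (not the paper's actual artifacts), might look like:

```python
# Sketch: fine-tune RoBERTa to flag security-related sentences (label 1) vs. others (label 0).
# File and column names are illustrative; the CSVs are assumed to have "sentence" and "label".
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("csv", data_files={"train": "iot_sentences_train.csv",
                                          "test": "iot_sentences_test.csv"})
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

args = TrainingArguments(output_dir="secbot-sketch", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
```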
Sentiment analysis in software engineering (SE) has shown promise for analyzing and supporting diverse development activities. We report the results of an empirical study we conducted to determine the feasibility of developing an ensemble engine by combining the polarity labels of stand-alone SE-specific sentiment detectors. Our study has two phases. In the first phase, we picked five SE-specific sentiment detection tools from two recently published papers by Lin et al. [31, 32], who first reported negative results with stand-alone sentiment detectors and then proposed an improved SE-specific sentiment detector, POME [31]. We report results on 17,581 units (sentences/documents) from six currently available sentiment benchmarks for SE. We find that the existing tools can be complementary to each other in 85-95% of the cases, i.e., one is wrong but another is right. However, a majority-voting-based ensemble of these tools fails to improve the accuracy of sentiment detection. We developed Sentisead, a supervised tool that combines the polarity labels and a bag of words as features. Sentisead improves the performance (F1-score) of the individual tools by 4% (over Senti4SD [5]) to 100% (over POME [31]). In the second phase, we compare and improve the Sentisead infrastructure using pre-trained transformer models (PTMs). We find that a Sentisead infrastructure with RoBERTa as the ensemble of the five stand-alone rule-based and shallow-learning SE-specific tools from Lin et al. [31, 32] offers the best F1-score of 0.805 across the six datasets, while a stand-alone RoBERTa shows an F1-score of 0.801.
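The supervised ensemble described here feeds a bag-of-words representation together with the individual detectors' polarity labels into a single classifier. A minimal sketch of that idea, with illustrative tool outputs and a Random Forest chosen only as an example (not the paper's exact pipeline):

```python
# Sketch of a Sentisead-style supervised ensemble: bag-of-words features plus the polarity
# labels predicted by several stand-alone detectors are concatenated and fed to one classifier.
# The tool outputs, labels, and texts below are illustrative.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import RandomForestClassifier

texts = ["This API is a pleasure to use", "The build keeps failing, so frustrating"]
tool_polarities = np.array([[1, 1, 0],      # predictions of three stand-alone tools,
                            [-1, -1, -1]])  # one column per tool
gold = np.array([1, -1])                    # gold polarity labels

bow = CountVectorizer().fit_transform(texts)           # bag-of-words features
features = hstack([bow, csr_matrix(tool_polarities)])  # append tool votes as extra features

ensemble = RandomForestClassifier(n_estimators=200, random_state=0)
ensemble.fit(features, gold)
```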
Neural models that do not rely on pre-training have excelled in the keyphrase generation task with large annotated datasets. Meanwhile, new approaches have incorporated pre-trained language models (PLMs) for their data efficiency. However, a systematic study of how the two types of approaches compare and how different design choices affect the performance of PLM-based models is still lacking. To fill in this knowledge gap and facilitate a more informed use of PLMs for keyphrase extraction and keyphrase generation, we present an in-depth empirical study. Formulating keyphrase extraction as sequence labeling and keyphrase generation as sequence-to-sequence generation, we perform extensive experiments in three domains. After showing that PLMs have competitive high-resource performance and state-of-the-art low-resource performance, we investigate important design choices including in-domain PLMs, PLMs with different pre-training objectives, using PLMs with a parameter budget, and different formulations for present keyphrases. Further results show that (1) in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models; (2) with a fixed parameter budget, prioritizing model depth over width and allocating more layers in the encoder leads to better encoder-decoder models; and (3) introducing four in-domain PLMs, we achieve a competitive performance in the news domain and the state-of-the-art performance in the scientific domain.
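One way to read the sequence-labeling formulation is as BIO token classification over the document with a BERT-like PLM. A small sketch under that assumption (the model name and tag set are illustrative, and the classification head would still need fine-tuning on annotated keyphrase data):

```python
# Sketch: keyphrase extraction as sequence labeling with BIO tags over document tokens.
# A BERT-like encoder predicts, for each token, whether it begins (B), continues (I),
# or is outside (O) a present keyphrase. Model name and tag set are illustrative, and
# the token-classification head here is untrained.
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

labels = ["O", "B-KP", "I-KP"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased",
                                                        num_labels=len(labels))

text = "We study keyphrase extraction with pre-trained language models."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits                    # shape: (1, seq_len, 3)
pred_tags = [labels[i] for i in logits.argmax(dim=-1)[0].tolist()]
# Contiguous B-KP / I-KP spans are then decoded into predicted present keyphrases.
```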
Privacy policies provide individuals with information about their rights and how their personal information is handled. Natural language understanding (NLU) technologies can help individuals and practitioners better understand the privacy practices described in lengthy and complex documents. However, existing efforts that use NLU technologies are limited because they process the language for a single task that focuses on certain privacy practices. To this end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding across various tasks. We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training. We demonstrate that domain-specific pre-training offers performance improvements across all tasks. We release the benchmark to encourage future research in this domain.
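Domain-specific pre-training here means continuing masked language modeling on a privacy-policy corpus before fine-tuning on the benchmark tasks. A minimal sketch of continued MLM pre-training, assuming the Hugging Face libraries and an illustrative corpus file (not the paper's actual corpus or hyperparameters):

```python
# Sketch of domain-specific continued pre-training: run masked language modeling over a
# privacy-policy text corpus before task fine-tuning. The file path and settings are illustrative.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, TrainingArguments, Trainer)

corpus = load_dataset("text", data_files={"train": "privacy_policies.txt"})
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

corpus = corpus.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="policy-mlm", num_train_epochs=1,
                                         per_device_train_batch_size=16),
                  train_dataset=corpus["train"], data_collator=collator)
trainer.train()
```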
While pre-trained language models (LM) for code have achieved great success in code completion, they generate code conditioned only on the contents within the file, i.e., in-file context, but ignore the rich semantics in other files within the same project, i.e., cross-file context, a critical source of information that is especially useful in modern modular software development. This oversight constrains code language models' capacity for code completion, leading to unexpected behaviors such as generating hallucinated class member functions or function calls with unexpected arguments. In this work, we develop a cross-file context finder tool, CCFINDER, that effectively locates and retrieves the most relevant cross-file context. We propose CoCoMIC, a framework that incorporates cross-file context to learn the in-file and cross-file context jointly on top of pretrained code LMs. CoCoMIC successfully improves the existing code LM with a 19.30% relative increase in exact match and a 15.41% relative increase in identifier matching for code completion when the cross-file context is provided.
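CCFINDER and CoCoMIC themselves are not specified in detail in this summary; a toy sketch of the general idea, retrieving relevant snippets from other project files and prepending them to the in-file context before completion, might look like the following (the overlap-based retriever and the GPT-2 stand-in model are assumptions, not the actual system):

```python
# Toy sketch: retrieve the most relevant cross-file snippets from the project and prepend
# them to the in-file context before asking a code LM to complete. The retriever and model
# below are illustrative stand-ins, not CCFINDER/CoCoMIC.
from transformers import AutoTokenizer, AutoModelForCausalLM

def retrieve_cross_file_context(in_file_code, project_snippets, top_k=2):
    """Rank project snippets by naive token overlap with the in-file context."""
    query = set(in_file_code.split())
    scored = sorted(project_snippets,
                    key=lambda s: len(query & set(s.split())), reverse=True)
    return scored[:top_k]

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # stand-in for a code LM
model = AutoModelForCausalLM.from_pretrained("gpt2")

in_file = "order = Order(items)\norder.total_pr"
project = ["class Order:\n    def total_price(self):\n        return sum(self.items)"]

prompt = "\n".join(retrieve_cross_file_context(in_file, project)) + "\n" + in_file
ids = tokenizer(prompt, return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0][ids.shape[1]:]))
```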
Network intrusion detection systems (NIDSs) play an important role in computer network security. Among the several detection mechanisms, anomaly-based automated detection significantly outperforms the others. Given the growing number and sophistication of attacks, dealing with large amounts of data is a recognized issue in the development of anomaly-based NIDS. However, do current models meet the needs of today's networks in terms of required accuracy and dependability? In this research, we propose a new hybrid model that combines machine learning and deep learning to increase detection rates while maintaining dependability. Our proposed method ensures efficient pre-processing by combining SMOTE for data balancing and XGBoost for feature selection. We compared our developed method to various machine learning and deep learning algorithms to find a more efficient algorithm to implement in the pipeline. Furthermore, we chose the most effective model for network intrusion detection based on a set of benchmarked performance analysis criteria. Our method produces excellent results when tested on two datasets, KDDCUP'99 and CIC-MalMem-2022, with an accuracy of 99.99% and 100% for KDDCUP'99 and CIC-MalMem-2022, respectively, with no overfitting and no Type I or Type II errors.
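A minimal sketch of the described pre-processing pipeline, with SMOTE for class balancing and XGBoost feature importances for feature selection; the importance cut-off, synthetic data, and downstream classifier are illustrative choices rather than the paper's exact configuration:

```python
# Sketch of the described pre-processing: SMOTE balances the classes, then an XGBoost model's
# feature importances select features for the downstream detector. Data and thresholds are
# illustrative only.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1], random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)       # balance the classes

selector = XGBClassifier(n_estimators=100, eval_metric="logloss").fit(X_bal, y_bal)
keep = np.argsort(selector.feature_importances_)[-10:]        # keep the 10 most important features
X_sel = X_bal[:, keep]

detector = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sel, y_bal)
```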
Skeleton-based Motion Capture (MoCap) systems have long been widely used in the game and film industry to mimic complex human actions. MoCap data has also proved its effectiveness in human activity recognition tasks. However, the task remains quite challenging for smaller datasets, and the lack of such data for industrial activities further adds to the difficulties. In this work, we propose an ensemble-based machine learning methodology targeted at working better on MoCap datasets. The experiments have been performed on the MoCap data given in the Bento Packaging Activity Recognition Challenge 2021 (Bento is the Japanese word for a lunch box). After first processing the raw MoCap data, we achieved an accuracy of 98% on 10-fold cross-validation and 82% on leave-one-out cross-validation using the proposed ensemble model.
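The exact base learners of the ensemble are not listed here; a toy sketch of evaluating a voting ensemble under both 10-fold cross-validation and leave-one-out cross-validation, with illustrative estimators and synthetic data, might look like:

```python
# Toy sketch: evaluate a voting ensemble with 10-fold CV and leave-one-out CV, mirroring
# the evaluation protocol described above. The base learners and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, n_classes=4,
                           n_informative=10, random_state=0)

ensemble = VotingClassifier(estimators=[
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
], voting="soft")

print("10-fold CV accuracy:", cross_val_score(ensemble, X, y, cv=10).mean())
print("LOOCV accuracy:     ", cross_val_score(ensemble, X, y, cv=LeaveOneOut()).mean())
```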
The combination of big data and deep learning is a disruptive technology that, if used correctly, can greatly benefit any target domain. With the availability of large healthcare datasets and advances in deep-learning techniques, systems can now predict the future trends of any health problem reasonably well. From a literature survey, we found that SVMs have been used to predict heart failure without correlating the relevant objective factors. Leveraging the strength of the important historical information in electronic health records (EHR), we built an intelligent predictive model using Long Short-Term Memory (LSTM) and predicted future trends of heart failure based on these health records. The fundamental aim of this work is therefore to predict heart failure using an LSTM over patients' electronic medical information. We analyzed a dataset containing the medical records of 299 heart-failure patients collected at the Faisalabad Institute of Cardiology and the Allied Hospital in Faisalabad (Punjab, Pakistan). These patients comprise 105 women and 194 men between 40 and 95 years of age. The dataset contains 13 features that report the clinical, physical, and lifestyle information responsible for heart failure. We found increasing trends in our analysis, which will help advance knowledge in the field of heart-failure prediction.
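A minimal sketch of an LSTM classifier over the 13 clinical features, assuming each record is treated as a single-timestep sequence; the file name, column names, and hyperparameters are illustrative rather than the paper's actual setup:

```python
# Sketch of an LSTM classifier for the heart-failure records. Shaping the 13 clinical features
# into a one-timestep sequence is an assumption here; file/column names are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, Dense

df = pd.read_csv("heart_failure_clinical_records.csv")       # assumed: 299 rows, 13 features + label
X = df.drop(columns=["DEATH_EVENT"]).to_numpy(dtype="float32")
y = df["DEATH_EVENT"].to_numpy()
X = X.reshape((X.shape[0], 1, X.shape[1]))                    # (samples, timesteps=1, features)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = Sequential([LSTM(64, input_shape=(1, X.shape[2])),
                    Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_tr, y_tr, epochs=50, batch_size=16, validation_data=(X_te, y_te))
```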
Digital twin technology is considered an integral part of modern industrial development. With the rapid advancement of Internet of Things (IoT) technologies and the increasing trend toward automation, the integration between the virtual and physical worlds now makes practical, production-ready digital twins achievable. However, existing definitions of the digital twin are incomplete and sometimes ambiguous. Here, we conduct a historical review and analyze modern, generic views of the digital twin to create a new, extended definition. We also review and discuss existing work on digital twins in safety-critical robotics applications. In particular, the use of digital twins in industrial applications requires automated and remote operation due to environmental challenges. However, uncertainties in the environment may require careful monitoring of, and rapid adaptation by, the robots, which must remain safe and cost-effective. We present a case study in which we develop a framework for a safety-critical robotic-arm application, report the system's performance to show the framework's advantages, and discuss future challenges and scope.
Digital twin technology plays a pivotal role in modern industrial development. In particular, with technological advances in the Internet of Things (IoT) and the growing trend toward autonomy, multi-sensor-equipped robotics can create practical digital twins, which are especially useful in industrial applications for operations, maintenance, and safety. Here, we demonstrate a real-world digital twin of a safety-critical robotics application with a Franka Emika Panda robotic arm. We develop and showcase an edge-assisted, collaborative digital twin for dynamic obstacle avoidance, which can adapt the robot in real time when operating in uncertain and dynamic environments in the Industrial IoT.
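The demonstrated system is not detailed here; a toy sketch of the kind of safety check such a twin might run, mirroring planned end-effector waypoints and triggering a replan when clearance to a streamed obstacle position drops below a threshold, with all names and values illustrative:

```python
# Toy sketch of a digital-twin safety check: the twin mirrors the arm's planned end-effector
# waypoints, compares them against the latest streamed obstacle position, and requests a
# replan when clearance falls below a threshold. Values and names are illustrative only.
import numpy as np

SAFE_CLEARANCE_M = 0.15  # assumed minimum end-effector-to-obstacle distance, in meters

def needs_replan(planned_waypoints, obstacle_position, clearance=SAFE_CLEARANCE_M):
    """Return True if any planned waypoint comes closer to the obstacle than `clearance`."""
    dists = np.linalg.norm(planned_waypoints - obstacle_position, axis=1)
    return bool((dists < clearance).any())

# Example: a short Cartesian path and a dynamic obstacle position reported by the edge node.
waypoints = np.array([[0.40, 0.00, 0.30],
                      [0.45, 0.05, 0.30],
                      [0.50, 0.10, 0.30]])
obstacle = np.array([0.46, 0.06, 0.31])

if needs_replan(waypoints, obstacle):
    print("Clearance violated: request a new collision-free path from the planner.")
```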